Abstract: A new texture classification algorithm using the wavelet packet transform is proposed. It uses the principal component analysis technique and statistical distance measurement to combine and select frequency channel features, giving improved classification performance. Wavelet packet transform features are also compared with Fourier transform features on a set of eight optical texture images with several levels of added white noise. Both algorithms are successfully applied to the classification of under-ice sidescan sonar images.

I. INTRODUCTION

Texture analysis has found many important applications in areas such as medical imaging, computer vision, and remote sensing, and many successful algorithms have been proposed over the last few decades. Recently, multichannel analysis methods have repeatedly been shown to perform better than other techniques. These include texture energy measurement [1], the eigenfilter method [2], and the Gabor filter method [3]. The newly developed wavelet analysis technique [4] [5] provides yet another useful framework for multiscale image processing. The texture research community is currently devoting considerable effort to wavelet applications in texture analysis. Henke-Reed and Cheng [6] applied wavelet transforms to texture images, using the energy ratios between frequency channels as features. Chang and Kuo [7] proposed a tree-structured wavelet transform algorithm for texture classification, which is similar to the wavelet packet best basis selection algorithm of Coifman [5]. Laine and Fan [8] used the wavelet packet transform energy measurements directly as texture features in their texture classification approach.

These researchers have demonstrated that the wavelet transform is a valuable tool for texture analysis. However, these approaches share a common problem: they are all direct applications of existing wavelet processing algorithms. Such algorithms are ideal for signal representation but not necessarily the best for signal discrimination. To fully utilize the power of the wavelet packet transform, new techniques tailored to extracting features of greater discriminative ability must be developed. In this paper we propose using the principal component analysis technique and statistical distance measurement to combine and select frequency-channel features that give improved classification performance.

We also compare this new approach with the Fourier transform texture classification method that we proposed in [9]. Just as the ideal tool for nonstationary signal analysis is the wavelet transform, the ideal tool for stationary signals is the Fourier transform. Since texture signals are mostly stationary, we should expect the Fourier transform to generate better results.

The new algorithms are tested on two data sets. The first includes eight types of natural optical images obtained from the MIT Media Lab Vistex texture database; with this larger number of classes we hope to obtain more conclusive results. For our real-world application, we apply the algorithms to a second set of sidescan sonar images from a sonar survey of an Arctic under-ice canopy [10].

We first give a brief review of the wavelet transform and the wavelet packet transform in Section II. The proposed feature selection methods are described in Section III. The experimental results on the Vistex textures and the sidescan sonar images are reported in Section IV. Finally, we draw conclusions in Section V.

II. WAVELET AND WAVELET PACKET TRANSFORM

For simplicity, a one-dimensional discrete signal $f(k)$ of length $n = 2^{n_0}$ is used in this section. The wavelet transform can be thought of as a smooth partition of the signal frequency axis. First, a lowpass filter $h(m)$ and a highpass filter $g(m)$, both of length $M$, are used to filter the signal into two subbands, which are then downsampled by a factor of two. Let $H$ and $G$ be the convolution-downsampling operators defined as:

$$Hf(k) = \sum_{m=0}^{M-1} h(m)\, f(2k+m), \tag{1}$$

$$Gf(k) = \sum_{m=0}^{M-1} g(m)\, f(2k+m). \tag{2}$$

$H$ and $G$ are called perfect reconstruction quadrature mirror filters (QMFs) if they satisfy the following orthogonality conditions:

$$HG^* = GH^* = 0, \tag{3}$$

$$H^*H + G^*G = I, \tag{4}$$

where $H^*$ and $G^*$ are the adjoint (i.e., upsampling-anticonvolution) operators of $H$ and $G$, respectively, and $I$ is the identity operator. This filtering and downsampling process is continued iteratively on the low-frequency subbands. At each level of the process, the high-frequency subband is preserved. When the process reaches the highest decomposition level, both the low- and high-frequency bands are kept. If the maximum processing level is $L$, the discrete wavelet coefficients of signal $f(k)$ are then $\{Gf, GHf, GH^2f, \ldots, GH^{L-1}f, H^Lf\}$. Together these have the same length $n$ as the original signal. Because of the orthogonality conditions on $H$ and $G$, each level of decomposition can be considered as a decomposition of the vector space into two mutually orthogonal subspaces. Let $V_{0,0}$ denote the original vector space, and let $V_{1,0}$ and $V_{1,1}$ be the mutually orthogonal subspaces generated by applying $H$ and $G$ to $V_{0,0}$. Then the $l$th level of decomposition can be written as

$$V_{l,0} = V_{l+1,0} \oplus V_{l+1,1}, \tag{5}$$

for $l = 0, 1, \ldots, L-1$. Figure 1 shows such a decomposition process. Each subspace $V_{l,b}$, with $b = 0$ or $1$, is spanned by $2^{n_0-l}$ wavelet basis vectors $\{\psi_{l,b,c}\}_{c=0}^{2^{n_0-l}-1}$, which can be derived from $H$, $G$, and their adjoint operators.

Figure 1. Standard wavelet transform binary tree.

Figure 2. Wavelet packet transform binary tree.
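A minimal sketch of the operators in Eqs. (1)-(2) and of the iterated lowpass splitting described above, written in Python with NumPy. The Haar filter pair, the periodic extension at the signal border, and the names `conv_down` and `wavelet_transform` are illustrative assumptions. They are not the filters or code used in this paper.

```python
import numpy as np

# Assumed Haar QMF pair (illustrative; any orthonormal QMF pair would do).
H_FILT = np.array([1.0, 1.0]) / np.sqrt(2.0)   # lowpass h(m)
G_FILT = np.array([1.0, -1.0]) / np.sqrt(2.0)  # highpass g(m)


def conv_down(f, filt):
    """Convolution-downsampling of Eqs. (1)-(2):
    out(k) = sum_m filt(m) * f(2k + m), using periodic extension at the border."""
    n = len(f)
    out = np.zeros(n // 2)
    for k in range(n // 2):
        for m, c in enumerate(filt):
            out[k] += c * f[(2 * k + m) % n]
    return out


def wavelet_transform(f, levels):
    """Pyramid DWT returning [Gf, GHf, ..., GH^{L-1}f, H^L f]."""
    coeffs = []
    low = np.asarray(f, dtype=float)
    for _ in range(levels):
        coeffs.append(conv_down(low, G_FILT))  # keep the highpass band
        low = conv_down(low, H_FILT)           # continue splitting the lowpass band
    coeffs.append(low)                         # final lowpass band H^L f
    return coeffs
```

With an orthonormal pair such as Haar, the coefficient lengths add up to $n$ and the total energy is unchanged, which is what conditions (3)-(4) imply.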

From the above iterative filtering operations, we can see that the wavelet transform partitions the frequency axis finely toward the lower frequency region. It is therefore suitable for a smooth signal containing primarily low-frequency energy. It is not necessarily appropriate for other, more general types of signals, such as textures. The wavelet packet transform is a generalization of the standard wavelet transform that decomposes both the high- and low-frequency bands at each iteration. Like the wavelet transform, it generates two subbands, $Hf$ and $Gf$, in the first level of decomposition. However, the second-level process generates four subbands, $H^2f$, $GHf$, $HGf$, and $G^2f$. The wavelet transform, by contrast, has only the two bands $H^2f$ and $GHf$ at this level. If the process is repeated $L$ times, $Ln$ wavelet packet coefficients are obtained. In orthogonal subspace representation, the $l$th level of decomposition is

$$V_{l,b} = V_{l+1,2b} \oplus V_{l+1,2b+1}, \tag{6}$$

where $l = 0, 1, \ldots, L-1$ is the level index and $b = 0, \ldots, 2^l - 1$ is the channel block index in each level. Figure 2 illustrates the wavelet packet decomposition of the original vector space $V_{0,0}$. Again, each subspace $V_{l,b}$ is spanned by $2^{n_0-l}$ basis vectors $\{W_{l,b,c}\}_{c=0}^{2^{n_0-l}-1}$. For $b = 0$ and $1$, $W$ can be identified with $\psi$.
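The full wavelet packet tree differs from the pyramid only in that every node is split, as in Eq. (6). The sketch below makes the same assumptions as before: Haar filters and periodic border handling. `wavelet_packet` is a hypothetical helper whose entry `tree[l][b]` stands for the coefficients of $V_{l,b}$.

```python
import numpy as np

H_FILT = np.array([1.0, 1.0]) / np.sqrt(2.0)   # assumed Haar lowpass
G_FILT = np.array([1.0, -1.0]) / np.sqrt(2.0)  # assumed Haar highpass


def conv_down(f, filt):
    """Vectorized Eq. (1)/(2) with periodic extension."""
    n = len(f)
    idx = (2 * np.arange(n // 2)[:, None] + np.arange(len(filt))[None, :]) % n
    return f[idx] @ filt


def wavelet_packet(f, levels):
    """Full wavelet packet tree; tree[l][b] holds the coefficients of V_{l,b}."""
    tree = [[np.asarray(f, dtype=float)]]
    for _ in range(levels):
        next_level = []
        for node in tree[-1]:
            # Eq. (6): split every node, lowpass and highpass alike,
            # into children 2b (H) and 2b+1 (G); this is natural (Paley) order.
            next_level.append(conv_down(node, H_FILT))
            next_level.append(conv_down(node, G_FILT))
        tree.append(next_level)
    return tree
```

Each level below the root holds $n$ coefficients, so $L$ levels give $Ln$ coefficients. Every level preserves the signal energy.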

For two-dimensional images, the wavelet or wavelet packet basis functions can be expressed as the tensor product of two one-dimensional basis functions in the horizontal and vertical directions. The corresponding 2-D filters are thus:

$$h_{HH}(m,n) = h(m)\,h(n), \tag{7}$$

$$h_{HG}(m,n) = h(m)\,g(n), \tag{8}$$

$$h_{GH}(m,n) = g(m)\,h(n), \tag{9}$$

$$h_{GG}(m,n) = g(m)\,g(n). \tag{10}$$

In Fig. 3, we show three sample textures and their wavelet packet coefficients for levels 1 to 4.
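A sketch of one level of 2-D decomposition with the separable filters (7)-(10), again assuming Haar filters and periodic borders. `split_2d` and `mean_energy` are hypothetical names. The check confirms that, for orthonormal filters, a channel's mean squared coefficient equals the average over its four children. This is the energy-preservation argument behind the parent-children variance relation in Section III. Here VAR is taken to be the mean squared coefficient. That is an assumption, and it matches the variance when the coefficients have zero mean.

```python
import numpy as np

h = np.array([1.0, 1.0]) / np.sqrt(2.0)   # assumed Haar lowpass
g = np.array([1.0, -1.0]) / np.sqrt(2.0)  # assumed Haar highpass


def split_rows(x, filt):
    """Apply Eq. (1)/(2) along axis 0 (periodic), halving the number of rows."""
    n = x.shape[0]
    idx = (2 * np.arange(n // 2)[:, None] + np.arange(len(filt))[None, :]) % n
    return np.tensordot(x[idx], filt, axes=([1], [0]))


def split_2d(x):
    """One level with the separable filters of Eqs. (7)-(10).
    Returns the four children in the order [HH, HG, GH, GG]."""
    children = []
    for row_filt in (h, g):              # first factor, along m (rows)
        r = split_rows(x, row_filt)
        for col_filt in (h, g):          # second factor, along n (columns)
            children.append(split_rows(r.T, col_filt).T)
    return children


def mean_energy(x):
    """Mean squared coefficient (equals the variance for zero-mean data)."""
    return np.mean(x ** 2)
```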

III. METHODOLOGY

We develop our algorithm by addressing the three main issues of multichannel texture classification: 1) feature extraction within each channel, 2) channel selection, and 3) channel relationships and feature combination among channels.

First, since the wavelet coefficients are shift variant, they are not suitable for direct use as texture features. It is therefore important to extract shift-invariant features within each channel. We choose to test the following shift-invariant measurements:

where $x(i,j)$ denotes an element of the wavelet packet coefficient matrix $x$ in each channel. To make our algorithm suitable for sidescan sonar images, the texture sample mean is removed before a feature vector is computed. This is necessary because a sidescan sonar image usually remains cross-track range dependent even after the best effort to apply angle-varying gain correction. Thus, the mean feature in Equation (11) becomes zero, and the four features we use in our experiments are: 1) the variance feature VAR, with $k = 2$ in Equation (12); 2) the entropy feature ENT in Equation (13); 3) the third moment MNT3; and 4) the fourth moment MNT4.

Figure 3. Three sample textures (row 1) and their wavelet packet coefficients at decomposition levels 1, 2, 3, and 4 (rows 2-5).
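Equations (11)-(13) are not reproduced in this copy, so the following sketch uses assumed standard forms. VAR, MNT3, and MNT4 are computed as sample moments about the channel mean. ENT is computed as a Shannon entropy of the normalized coefficient energies. These forms and the helper `channel_features` are illustrative assumptions. Every statistic pools all coefficients of a channel, so it does not depend on where a pattern falls inside the channel; the test checks this with a circular shift.

```python
import numpy as np


def channel_features(x, eps=1e-12):
    """Pooled statistics of one channel's coefficient matrix x.
    The exact forms of Eqs. (11)-(13) are ASSUMED here, not taken from the paper."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()                      # deviations from the channel mean
    var = np.mean(d ** 2)                 # second moment (VAR)
    mnt3 = np.mean(d ** 3)                # third moment (MNT3)
    mnt4 = np.mean(d ** 4)                # fourth moment (MNT4)
    p = x ** 2 / (np.sum(x ** 2) + eps)   # normalized coefficient energies (assumed)
    ent = -np.sum(p * np.log(p + eps))    # Shannon-type entropy (ENT, assumed form)
    return {"VAR": var, "ENT": ent, "MNT3": mnt3, "MNT4": mnt4}
```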

Note that the orthogonality condition of the wavelet transform means that the decomposition preserves energy. Thus, for the variance feature, the following relation holds for any node and its four children nodes:

$$\mathrm{VAR}_{l,b} = \frac{1}{4}\sum_{p=0}^{3} \mathrm{VAR}_{l+1,4b+p}. \tag{14}$$

We clearly see the effect of this linear relationship on the classification accuracy of overcomplete wavelet packet features in the experiments described in the next section.

After the features are computed within each channel, the second issue is how to select good features among channels. One possible approach is to apply a statistical distance measure to each feature and select the features with large distance measures. However, this approach has two drawbacks. The first is that neighboring channel features tend to be correlated, and thus contain similar information. If one feature has a large distance measure, its neighbor will too, and both will be selected, so we keep selecting the same kind of feature. The second drawback concerns some very low energy channels. In such a channel, a small amount of unexpected noise may make the distance measure unrealistically large, so the channel is selected as a good one. To avoid these problems, we propose to merge the channel selection step with the third step, i.e., channel combination. The merged step is a single feature selection procedure based on the principal component analysis technique and statistical distance measurement.

The widely used Karhunen-Loeve transform (KLT) is an ideal feature reduction and selection procedure for our algorithm. Its decorrelation ability serves to decorrelate neighboring channel features. Its energy packing property serves to remove noisy channels and to compact useful information into a few dominant features. However, some feature vectors are large, such as the features of a higher level wavelet packet decomposition or the Fourier transform features. For these, computing the eigenvectors of the covariance matrix could be prohibitively expensive. We use the dominant eigenvector estimation method described in [9] [11] to overcome this problem.

However, although KLT-selected features are optimal for representation, they may not be the best for classification. Additional class separability measurements are therefore used to select among the KLT-decorrelated features. In this study we use the Bhattacharyya distance measurement. It is chosen because of its direct relation to the error bound of the Bayes classifier and its simple form for features with normal distributions [11]:

$$B(C_1, C_2) = \frac{1}{8}(\mu_1-\mu_2)^T\left[\frac{W_1+W_2}{2}\right]^{-1}(\mu_1-\mu_2) + \frac{1}{2}\ln\frac{\left|\frac{W_1+W_2}{2}\right|}{\sqrt{|W_1|\,|W_2|}}, \tag{15}$$

where $\mu_i$ and $W_i$ are the mean vector and covariance matrix of class $C_i$.

Computing the Bhattacharyya distance for several features at once is not a practical approach. There are too many possible combinations of several features, and the covariance matrix may be singular. The one-at-a-time method is adopted instead. The formula is the same as Equation (15), only with the covariance matrix $W$ replaced by the variance and the mean vector $\mu$ replaced by the class mean. For multiclass problems, we select features with small values of
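The following sketch combines the two selection steps. It applies a KLT to the pooled feature vectors, then ranks the decorrelated features one at a time by the scalar form of Eq. (15). It substitutes a full eigendecomposition (`numpy.linalg.eigh`) for the dominant eigenvector estimation method of [9] [11]. It also ranks features for two classes only; the multiclass criterion is not reproduced here. The helpers `klt`, `bhattacharyya_1d`, and `rank_features` are hypothetical.

```python
import numpy as np


def klt(features, n_keep):
    """Project feature vectors (rows) onto the n_keep leading eigenvectors of
    their covariance matrix (full eigendecomposition, used here for simplicity)."""
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_keep]      # largest-variance directions first
    return centered @ eigvecs[:, order]


def bhattacharyya_1d(a, b):
    """Eq. (15) for a single feature: variances replace covariance matrices."""
    m1, m2 = a.mean(), b.mean()
    v1, v2 = a.var(), b.var()
    v = 0.5 * (v1 + v2)
    return (m1 - m2) ** 2 / (8.0 * v) + 0.5 * np.log(v / np.sqrt(v1 * v2))


def rank_features(x1, x2):
    """One-at-a-time ranking for two classes; largest distance first."""
    scores = np.array([bhattacharyya_1d(x1[:, j], x2[:, j]) for j in range(x1.shape[1])])
    return np.argsort(scores)[::-1], scores
```

In use, `klt` would be applied to the training vectors of all classes together. The rows of the result would then be split by class and passed to `rank_features`.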
In the next section we test our algorithms on the following groups of features:

1. level 1: VAR, ENT, MNT3, MNT4, all;

2. level 2: VAR, ENT, MNT3, MNT4, all;

3. level 3: VAR, ENT, MNT3, MNT4, all;

4. level 4: VAR, all;

5. level 1&2: VAR, all;

6. level 1&2&3: VAR, all;

7. level 1&2&3&4: VAR;

8. Wavelet: VAR, all;

9. FFT: magnitude.

The goal is to test the discrimination power of each feature type at each individual level, the effect of overcomplete representation, and the classification power of the standard wavelet transform. We also test the Fourier transform features. These can be considered an extreme case of the wavelet packet transform, i.e., the highest possible level of wavelet packet decomposition. This Fourier transform feature should not be confused with the traditional power spectrum method (PSM). For a comparison of our approach with the traditional PSM, refer to [9].

The classification algorithm used in this study is the Gaussian classifier, chosen for two reasons. First, it agrees with the above error bound defined by the Bhattacharyya distance. Second, since our focus is on feature extraction, we choose the simplest classification algorithm available. Again, we assume that the feature vector $x$ for each class $i$ has a Gaussian distribution with mean vector $\mu_i$ and covariance matrix $W_i$. Then the distance measure is defined as [11]

$$D_i = (x-\mu_i)^T W_i^{-1}(x-\mu_i) + \ln|W_i|, \tag{17}$$

where the first term on the right-hand side is the squared Mahalanobis distance. The decision rule is

$$x \in C_k \quad \text{when} \quad D_k = \min_i \{D_i\}. \tag{18}$$
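A minimal sketch of the Gaussian classifier of Eqs. (17)-(18). It assumes that class means and covariance matrices are estimated from training samples with NumPy. The class name `GaussianClassifier` and its methods are hypothetical. `numpy.linalg.slogdet` is used to obtain $\ln|W_i|$ without overflow in the determinant.

```python
import numpy as np


class GaussianClassifier:
    """Minimum-distance Gaussian classifier following Eqs. (17)-(18)."""

    def fit(self, x, y):
        self.classes_ = np.unique(y)
        self.means_ = [x[y == c].mean(axis=0) for c in self.classes_]
        # atleast_2d keeps the single-feature case as a 1x1 matrix
        self.covs_ = [np.atleast_2d(np.cov(x[y == c], rowvar=False)) for c in self.classes_]
        return self

    def distances(self, x):
        d = []
        for mu, w in zip(self.means_, self.covs_):
            diff = x - mu
            # (x - mu)^T W^{-1} (x - mu) for every row of x
            maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(w), diff)
            _, logdet = np.linalg.slogdet(w)    # ln|W_i|
            d.append(maha + logdet)
        return np.stack(d, axis=1)

    def predict(self, x):
        # Eq. (18): assign each sample to the class with the smallest D_i
        return self.classes_[np.argmin(self.distances(x), axis=1)]
```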


IV. EXPERIMENTAL RESULTS AND DISCUSSION

We test our algorithms on a set of eight types of natural optical images obtained from the MIT Media Lab Vistex texture database (see Fig. 4). The original 512x512 color images are converted to gray-scale images of the same size with 256 gray levels. Adaptive histogram equalization is then applied, so that all images have the same flat histogram. As a result, the images are indistinguishable from each other in terms of first-order statistics. To test the sensitivity of our algorithms to noise, we add several levels of white noise to the data. Choosing eight classes of images, flattening the histograms, and adding noise all make the classification task more difficult. The intent is that differences in the classification ability of various texture features then become more apparent. We then select the most successful methods to test on noisy, real-world sidescan sonar images. The three classes of sidescan sonar texture images used are shown in Fig. 5: first-year (young) ice, multiyear undeformed ice, and multiyear deformed ice. For both data sets, each class of image is divided into 225 half-overlapping samples of dimension 64x64, of which 60 samples are used for training. Therefore, the total number of data samples is 1800 for the Vistex data and 675 for the sidescan sonar data, with 480 and 180 samples used for training, respectively.

Figure 4. Vistex textures: bark.0008, brick.0004, buildings.0009, fabric.0001, fabric.0005, fabric.0013, fabric.0017, flowers.0007, from top to bottom, then from left to right.

Figure 5. Sidescan sonar images of an Arctic under-ice canopy: (a) first-year (young) ice, (b) multiyear undeformed ice, and (c) multiyear deformed ice.
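The sample counts above follow from simple arithmetic. The sketch below assumes that each class image is 512 x 512; this is stated above for the Vistex data and assumed here for the sonar data as well. It also assumes that half-overlapping 64 x 64 windows advance by 32 pixels. `half_overlap_count` is a hypothetical helper.

```python
def half_overlap_count(image_size=512, window=64):
    """Number of half-overlapping window positions in a square image."""
    step = window // 2                          # half overlap: advance by half a window
    per_axis = (image_size - window) // step + 1
    return per_axis ** 2


samples = half_overlap_count()                  # 15 positions per axis -> 225
train_per_class = 60
vistex_classes, sonar_classes = 8, 3
print(samples,
      vistex_classes * samples, vistex_classes * train_per_class,
      sonar_classes * samples, sonar_classes * train_per_class)
# prints: 225 1800 480 675 180
```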

Table 1 shows the complete test results for the Vistex data. It is difficult to make sense of such a large number of results directly from the table. We therefore point out only a few of its apparent features, then use plots of the results from the table to illustrate our other findings.

First, notice that for some feature groups the difference in classification accuracy between the training and testing data is very large, more than 50% in some cases (the SNR 1dB data). In fact, the level one features have only four channels. Apart from these, almost all training data classifications achieve more than 95% accuracy, including the noisy SNR 1dB data. This is not the case for the testing data. Since only the simple Gaussian classifier is used here, we should expect this trend to be even more apparent for more sophisticated classifiers, which can learn a more precise feature structure of the training data. This result shows that the widely used leave-one-out testing scheme can be rather deceptive for evaluating new algorithms. Leaving one sample out does not affect the training process much, and in the case of the Gaussian classifier the effect is minimal. This means that if the data set is too small, the results will not be conclusive.

Also note in the table that the number of features used to achieve the best result for each feature group is mostly about 10, with little variation. A general trend is that noisier data tend to need more features to reach the best classification accuracy.

To focus on the classification accuracy over the whole data set, Fig. 6 compares the four types of features and their combinations in the first three decomposition levels. The MNT3 feature is the worst for all levels and all data sets; it is apparently not a useful measurement. Entropy also gives much less satisfactory results than the variance feature, and its classification accuracy drops sharply for noisy data. This contradicts the result given in [8], which shows energy features performing only slightly better than entropy measures (within one percent). The MNT4 feature seems to give better results than the above two features but is still less successful than the variance feature. The performance differences between MNT4 and variance are consistent over all data sets and all decomposition levels, because the two are very closely correlated features.

The observation that variance features perform better than other features is consistent with Laws' [1] experiments with features extracted from empirical frequency channels. The remaining question is whether other measurements are needed to add new information. We performed a union operation on the correctly classified samples of the four types of features. The correct classification rate increased by about 5%, to nearly 100% accuracy. This demonstrates that each feature has its own distinct classification power. By combining all features, we get improved results for the lower decomposition levels. Feature vectors are much shorter at these levels, so an additional dimension helps more than it does at higher decomposition levels. The improvement is not as impressive as that of the union of results. Besides the new information that additional features bring in, they also add a great deal of noise, which may overwhelm their benefit.
Table 1: Complete test results on eight Vistex texture images.

Figure 6. Comparison of the four types of features in the first three individual decomposition levels. The index of the horizontal axis represents the signal-to-noise ratio (SNR) level: 1. original images, 2. SNR 15dB, 3. SNR 5dB, 4. SNR 1dB.

Figure 7. Comparison of variance features for individual decomposition levels, overcomplete levels, standard wavelet, and Fourier transform. The index of the horizontal axis represents the same SNR as in Fig. 6.


We now look in detail at the variance measurement results shown in Fig. 7. From Fig. 7(a), for individual levels, the general trend is that the higher the decomposition level, the better the result. This is predictable from Equation (14), which shows that the lower level variance features are simply the averages of their higher level children nodes. A KLT will do better than such a simple averaging operation in terms of extracting maximum information. To confirm this point, compare Fig. 7(a) and (b). The following pairs of results are almost identical: level 1&2 vs. level 2, level 1&2&3 vs. level 3, and level 1&2&3&4 vs. level 4. This means that lower level features are only a subset of higher level decomposition features. This contradicts the suggestion of Laine and Fan [8] that redundancy may provide additional discrimination power. Our experiments show that overcomplete representation does not add discrimination ability. Instead, that ability is extracted by applying the KLT to higher levels of finer channel decomposition. The KLT then combines the channel nodes in an optimal way rather than by simple averaging.

Continuing further along this path, we should expect that the Fourier transform may provide even more information. The reason is that the Fourier transform is really the extreme case of the wavelet packet transform, i.e., the wavelet packet transform at its highest possible level. Figure 7(c) compares the performance of three levels of wavelet packet decomposition, the standard wavelet transform, and the Fourier transform. The Fourier transform indeed performs consistently better than all other feature groups on all levels of noisy data sets. This result should not come as a surprise. The wavelet transform is optimal for nonstationary signal analysis, whereas the Fourier transform is optimal for stationary signal analysis, and most texture images are stationary, periodic signals.

Next, notice in Fig. 7(c) that the Fourier transform and the higher levels of wavelet packet decomposition are very insensitive to noise. It is surprising to see that more than 95% accuracy is achieved at a noise level of 1dB. By comparison, in [7] the tree-structured wavelet algorithm collapses to 70% accuracy at a 5dB noise level. Noise insensitivity is a real strength of subband image processing. Noise usually has a flat spectrum, so dividing the signal into more subbands decreases the noise energy in each subband. The energy of the signal, by contrast, tends to concentrate in a small number of channels. In our SNR 1dB test data, the total energies of the signal and the noise are almost the same. Even in such a case, the signal-to-noise ratio will be much higher in the channels containing most of the signal energy. Our feature selection algorithms are designed to pick out these high-SNR signal channels and condense them into a compact representation of the data. The incoherent noisy channels are neglected.
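The subband noise argument can be illustrated numerically. The sketch below is an idealized stand-in, not the paper's experiment. It uses equal-width FFT bands as "subbands", a single sinusoid as the signal, and white Gaussian noise scaled to an overall SNR of roughly 0 dB; all names and parameters are assumptions. With 64 bands, the band holding the sinusoid should show an SNR about 10 log10(64), roughly 18 dB, above the overall figure. The exact value varies slightly with the noise draw.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_bands = 4096, 64
t = np.arange(n)
signal = np.sin(2 * np.pi * 300 * t / n)                         # energy at one frequency
noise = rng.standard_normal(n) * np.sqrt(np.mean(signal ** 2))  # about 0 dB overall


def band_energies(x, n_bands):
    """Energy in n_bands equal-width frequency bands (Nyquist bin dropped)."""
    spec = np.abs(np.fft.rfft(x))[: len(x) // 2] ** 2
    return spec.reshape(n_bands, -1).sum(axis=1)


sig_e = band_energies(signal, n_bands)
noise_e = band_energies(noise, n_bands)    # white noise spreads evenly over the bands
k = int(np.argmax(sig_e))                  # band holding most of the signal energy
overall_snr_db = 10 * np.log10(np.sum(signal ** 2) / np.sum(noise ** 2))
band_snr_db = 10 * np.log10(sig_e[k] / noise_e[k])
print(f"overall {overall_snr_db:.1f} dB, band {k}: {band_snr_db:.1f} dB")
```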

Finally, we test our algorithms on the classification of sidescan sonar images. Only the feature groups that performed best in the above experiment are used on the sonar images. Table 2 shows the results, which are consistent with the Vistex data results. Although there are fewer image classes, each class of image is noisy and nonuniform. This added difficulty increases the accuracy difference between the wavelet packet and Fourier transform methods, which again shows the superiority of the latter approach.

Table 2: Classification results on sidescan sonar images (correct classification rate, %).

#   Feature group        Training   Testing   All data   Feature num.
1   level 3 VAR            93.3       83.2      85.9         12
2   level 4 VAR            96.1       81.4      85.3         20
3   all levels VAR         98.3       85.3      88.7         23
4   standard wavelet       88.3       88.1      88.1          7
5   FFT magnitude          98.9       92.5      94.2          9

V. CONCLUSIONS

Based on the above experiments, the following conclusions are drawn:

1) Variance (energy) measurement is much better than entropy and higher-order moments. However, the latter three features do contain additional information that is distinct from the energy information. Better feature selection algorithms have yet to be developed to fully exploit this information.

2) Higher levels of decomposition perform better than lower levels. This leads to the conclusion that Fourier transform features are better than wavelet packet features.

3) For variance features, overcomplete representation does not give better results than individual level features.

4) Wavelet packet features are very insensitive to noise, and features from higher levels are less sensitive than lower level wavelet features. This noise insensitivity makes wavelet packet features suitable for sidescan sonar images, which are usually noisy.

5) The KLT and the Bhattacharyya distance measurement are good feature selection methods to use on wavelet packet features. The KLT is necessary, especially for higher level features.

6) There are great differences between the classification accuracy on the training data and on the testing data. This casts doubt on results obtained with the leave-one-out testing strategy used in many texture classification studies.